Adaptive Elastic Net Method for Cox Model


Chunhong Li 1 2 3 4 , Xinxing Wei 1,3 , Hongshuai Dai 2,4 


Abstract 

In this paper, we study the Adaptive Elastic Net method for the Cox model. We
prove the grouping effect and the oracle property of its estimators. Finally, we illustrate
these two properties with an empirical analysis and a numerical simulation, respectively.

1. Introduction

The aim of survival analysis is usually to identify risk factors and their contributions to risk.
Often many covariates are collected and a large parametric model is built, so efficiently
selecting the subset of significant variables on which the hazard function depends
becomes an important and challenging task. Recently, many scholars have applied variable
selection techniques developed for linear regression models to this kind of problem;
see Fan and Li [6] and the references therein. However, applying these methods
directly also causes some problems. To overcome these drawbacks, statisticians have
recently proposed a family of penalized partial likelihood methods for survival data,
such as the Lasso method.

The Lasso method introduced by Tibshirani [10] is a penalized least squares method
that imposes a penalty on the regression coefficients. Due to the nature of the penalty, the
Lasso method performs continuous shrinkage and automatic variable selection simultaneously.
However, the Lasso estimator does not possess the oracle property and is unstable
with high-dimensional data. Hence, Zou proposed the Adaptive Lasso method, which
has the oracle property: the true regression coefficients that are zero are automatically
estimated as zero, and the remaining coefficients are estimated as well as if the
correct submodel were known in advance. In contrast to the Lasso and Adaptive Lasso, the
Elastic Net method proposed by Zou and Hastie is particularly useful when the number
of predictors is much larger than the number of observations. In addition, the Elastic Net
method encourages a grouping effect, which means that strongly correlated predictors tend
to be in or out of the model together. However, Fan and Li [5], [6] stated that the estimator
of the Elastic Net method does not have the oracle property. Hence, Zou and Zhang [13]
proposed the Adaptive Elastic Net method, which has both the oracle property and the
grouping effect.

The Cox model [2] is a classical method for dealing with survival data. It is well known that
the Elastic Net method for the Cox model has the grouping effect and can deal with highly
correlated data. Inspired by these facts and by Zou and Zhang [13], in this paper we study
the Adaptive Elastic Net method for the Cox model and show that it has the grouping
effect and the oracle property.

The rest of this paper is organized as follows. In Section 2, we introduce the Adaptive 
Elastic Net method for the Cox model. Section 3 is devoted to studying the grouping 
effect. The oracle property is discussed in Section 4. In Section 5, we show these two 
properties by an empirical analysis and a numerical simulation. 


2. Adaptive Elastic Net method 

In this section, we give the definition of the Adaptive Elastic Net method for the Cox
model. We first recall some known facts about the Cox model. Recall that the hazard
function for an individual at the failure time $t$ is

$$h(t) = h_0(t)\exp\{\beta^T X\}, \tag{2.1}$$

where $h_0(t)$ is a baseline hazard function, $\beta = (\beta_1, \cdots, \beta_p)^T$ is the vector of
unknown regression coefficients, and $X$ is the covariate vector of an individual. Let $R_i$
denote the risk set at time $t_i - 0$, that is, the set of individuals who have not failed or been
censored by that time. Furthermore, let $X_j = (x_{j1}, \cdots, x_{jp})^T$ denote the value of $X$ for
the $j$th individual and $X_i$ the value for the individual failing at time $t_i$. Suppose a random
sample of $n$ individuals is chosen; then the likelihood function for inference about $\beta$ is given by

$$L(\beta) = \prod_{i=1}^{n} \frac{\exp\big(\sum_{k=1}^{p}\beta_k x_{ik}\big)}{\sum_{j\in R_i}\exp\big(\sum_{k=1}^{p}\beta_k x_{jk}\big)}.$$

Therefore, the log-likelihood function is

$$l(\beta) = \sum_{i=1}^{n}\Big\{\sum_{k=1}^{p}\beta_k x_{ik} - \ln\Big[\sum_{j\in R_i}\exp\Big(\sum_{k=1}^{p}\beta_k x_{jk}\Big)\Big]\Big\}. \tag{2.2}$$

By maximizing (2.2), we can get the estimator of $\beta$.
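
To make the structure of the log partial likelihood (2.2) concrete, the following is a minimal NumPy sketch, not the authors' implementation. The function name and argument layout are hypothetical, tied failure times are assumed absent, and censored individuals are assumed to contribute only through the risk sets.

```python
import numpy as np

def cox_log_partial_likelihood(beta, X, time, event):
    """Log partial likelihood in the form of (2.2).

    Assumptions of this sketch: no tied failure times; `event[i]` is True
    when individual i is observed to fail and False when censored.
    """
    beta = np.asarray(beta, dtype=float)
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)

    eta = X @ beta                       # linear predictors beta^T X_j
    total = 0.0
    for i in np.flatnonzero(event):      # only observed failures contribute a term
        at_risk = time >= time[i]        # risk set R_i: not failed or censored before t_i
        total += eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return total

# Three individuals failing at times 1, 2, 3 with beta = 0:
# the risk sets have sizes 3, 2, 1, so the value is -ln(3) - ln(2) - ln(1).
X_demo = np.array([[1.0], [2.0], [3.0]])
value = cox_log_partial_likelihood([0.0], X_demo, [1.0, 2.0, 3.0], [True, True, True])
```

The explicit loop over failures mirrors the outer sum in (2.2); for large samples one would normally sort by time and use cumulative sums instead.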

From Tibshirani [10] and Fan and Li [6], by first taking the negative of (2.2) and then
adding the appropriate penalty, we can get the Elastic Net estimator for the Cox model:

$$\hat\beta(EN) = \arg\min_{\beta}\Big\{\frac{1}{n}\sum_{i=1}^{n}\Big\{-\beta^T X_i + \ln\Big[\sum_{j\in R_i}\exp(\beta^T X_j)\Big]\Big\} + \lambda_1\|\beta\|_1 + \lambda_2\|\beta\|^2\Big\}, \tag{2.3}$$

where $\lambda_1 > 0$ and $\lambda_2 > 0$ are regularization parameters, and

$$\|\beta\|_1 = \sum_{j=1}^{p}|\beta_j| \quad\text{and}\quad \|\beta\|^2 = \sum_{j=1}^{p}\beta_j^2.$$







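Moreover, (2.3) can be rewritten as

$$\hat\beta(EN) = \arg\min_{\beta}\Big\{\frac{1}{n}\sum_{i=1}^{n}\Big\{-\sum_{k=1}^{p}\beta_k x_{ik} + \ln\Big[\sum_{j\in R_i}\exp\Big(\sum_{k=1}^{p}\beta_k x_{jk}\Big)\Big]\Big\} + \lambda_1\sum_{k=1}^{p}|\beta_k| + \lambda_2\sum_{k=1}^{p}\beta_k^2\Big\}. \tag{2.4}$$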


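Following Zou and Zhang [13], we introduce the Adaptive Elastic Net estimator for the
Cox model as follows.

Definition 2.1 The Adaptive Elastic Net estimator $\hat\beta(AEN) = (\hat\beta(AEN)_1, \cdots, \hat\beta(AEN)_p)^T$
for the Cox model is defined by

$$\hat\beta(AEN) = \arg\min_{\beta}\Big\{\frac{1}{n}\sum_{i=1}^{n}\Big\{-\sum_{k=1}^{p}\beta_k x_{ik} + \ln\Big[\sum_{j\in R_i}\exp\Big(\sum_{k=1}^{p}\beta_k x_{jk}\Big)\Big]\Big\} + \lambda_1^*\sum_{k=1}^{p}\hat\omega_k|\beta_k| + \lambda_2\sum_{k=1}^{p}\beta_k^2\Big\}, \tag{2.5}$$

where $\hat\omega_k = (|\hat\beta(EN)_k|)^{-\gamma}$ with $\gamma > 0$.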
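
The objective in (2.5) can be written down directly once an Elastic Net estimate is available for the adaptive weights. The sketch below is illustrative only: the function names are hypothetical, ties are assumed absent, and the small constant `eps` is an added safeguard (not part of the definition) that keeps the weights finite when an Elastic Net coefficient is exactly zero.

```python
import numpy as np

def neg_log_partial_likelihood(beta, X, time, event):
    """Negative of the log partial likelihood (2.2), assuming no ties."""
    eta = X @ beta
    total = 0.0
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]                      # risk set R_i
        total -= eta[i] - np.log(np.sum(np.exp(eta[at_risk])))
    return total

def adaptive_weights(beta_en, gamma, eps=1e-8):
    """Weights omega_k = |beta_EN_k|^(-gamma); eps is a hypothetical safeguard."""
    return (np.abs(beta_en) + eps) ** (-gamma)

def aen_objective(beta, X, time, event, lam1_star, lam2, weights):
    """Objective minimized in (2.5): scaled loss + weighted L1 + squared L2."""
    n = X.shape[0]
    loss = neg_log_partial_likelihood(beta, X, time, event) / n
    l1 = lam1_star * np.sum(weights * np.abs(beta))    # adaptive L1 penalty
    l2 = lam2 * np.sum(beta ** 2)                      # ridge-type penalty
    return loss + l1 + l2

# Tiny illustration with one covariate and three observed failures.
X_demo = np.array([[1.0], [2.0], [3.0]])
t_demo = np.array([1.0, 2.0, 3.0])
d_demo = np.array([True, True, True])
obj_at_zero = aen_objective(np.array([0.0]), X_demo, t_demo, d_demo,
                            lam1_star=0.5, lam2=0.5, weights=np.array([1.0]))
```

With all weights equal to one, the same function evaluates the Elastic Net objective (2.4), which makes the relationship between the two estimators easy to check numerically.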


3. Grouping effect 

In this section, we study the grouping effect of the Adaptive Elastic Net method for
the Cox model. Before we state the main result, we need the following notation. Let
$x_a = (x_{1a}, \cdots, x_{na})$ and $x_b = (x_{1b}, \cdots, x_{nb})$, $a, b = 1, \cdots, p$, be highly correlated, and let
$\hat\beta_a(\lambda_1^*, \lambda_2)$ and $\hat\beta_b(\lambda_1^*, \lambda_2)$ denote the estimators for the $a$th variable $x_a$ and the $b$th
variable $x_b$ in the covariate, respectively. Following the notation in Andersen and Gill [1], let
$Y_i(t) = I(T_i \ge t, C_i \ge t)$, $N_i(t) = I(T_i \le t, T_i \le C_i)$, $\delta_i = I(T_i \le C_i)$ and $Z_i = \min\{T_i, C_i\}$.
Then we have the following result.

Theorem 3.1 For the Cox model, given the data $(Z_i, \delta_i, X_i)$ and parameters $(\lambda_1^*, \lambda_2)$, the
responses are centered and the predictors are standardized. Let $\hat\beta(\lambda_1^*, \lambda_2)$ be the Adaptive
Elastic Net estimator. Suppose that

$$\hat\beta_a(\lambda_1^*, \lambda_2)\,\hat\beta_b(\lambda_1^*, \lambda_2) > 0. \tag{3.1}$$

Define

$$D_{\lambda_1^*,\lambda_2}(a,b)$$

then

$$D_{\lambda_1^*,\lambda_2}(a,b) \to 0,$$

which means that $D_{\lambda_1^*,\lambda_2}(a,b)$ approaches 0.
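Proof: By (3.1), we have

$$\operatorname{sgn}\{\hat\beta_a(\lambda_1^*, \lambda_2)\} = \operatorname{sgn}\{\hat\beta_b(\lambda_1^*, \lambda_2)\} \tag{3.2}$$

and

$$\hat\beta_a(\lambda_1^*, \lambda_2) \ne 0 \quad\text{and}\quad \hat\beta_b(\lambda_1^*, \lambda_2) \ne 0.$$

Now, let $\hat\beta_m(\lambda_1^*, \lambda_2) \ne 0$. Then at the point $\hat\beta(\lambda_1^*, \lambda_2)$,

$$\frac{\partial L(\lambda_1^*, \lambda_2, \beta)}{\partial \beta_m} = 0,$$

where

$$L(\lambda_1^*, \lambda_2, \beta) = \frac{1}{n}\sum_{i=1}^{n}\Big\{-\sum_{k=1}^{p}\beta_k x_{ik} + \ln\Big[\sum_{j\in R_i}\exp\Big(\sum_{k=1}^{p}\beta_k x_{jk}\Big)\Big]\Big\} + \lambda_1^*\sum_{k=1}^{p}\hat\omega_k|\beta_k| + \lambda_2\sum_{k=1}^{p}\beta_k^2.$$

Then,

$$-\frac{1}{n}\sum_{i=1}^{n}x_{ia} + \frac{1}{n}\sum_{i=1}^{n}\frac{\sum_{j\in R_i}x_{ja}\exp\big(\sum_{k=1}^{p}\hat\beta_k x_{jk}\big)}{\sum_{j\in R_i}\exp\big(\sum_{k=1}^{p}\hat\beta_k x_{jk}\big)} + \lambda_1^*\hat\omega_a\operatorname{sgn}\{\hat\beta_a(\lambda_1^*, \lambda_2)\} + 2\lambda_2\hat\beta_a(\lambda_1^*, \lambda_2) = 0.$$

Therefore,

$$\hat\beta_a(\lambda_1^*, \lambda_2) = \frac{1}{2\lambda_2}\Big\{\frac{1}{n}\sum_{i=1}^{n}x_{ia} - \frac{1}{n}\sum_{i=1}^{n}\frac{\sum_{j\in R_i}x_{ja}\exp\big(\sum_{k=1}^{p}\hat\beta_k x_{jk}\big)}{\sum_{j\in R_i}\exp\big(\sum_{k=1}^{p}\hat\beta_k x_{jk}\big)} - \lambda_1^*\hat\omega_a\operatorname{sgn}\{\hat\beta_a(\lambda_1^*, \lambda_2)\}\Big\}. \tag{3.3}$$

Similarly to (3.3), we get

$$\hat\beta_b(\lambda_1^*, \lambda_2) = \frac{1}{2\lambda_2}\Big\{\frac{1}{n}\sum_{i=1}^{n}x_{ib} - \frac{1}{n}\sum_{i=1}^{n}\frac{\sum_{j\in R_i}x_{jb}\exp\big(\sum_{k=1}^{p}\hat\beta_k x_{jk}\big)}{\sum_{j\in R_i}\exp\big(\sum_{k=1}^{p}\hat\beta_k x_{jk}\big)} - \lambda_1^*\hat\omega_b\operatorname{sgn}\{\hat\beta_b(\lambda_1^*, \lambda_2)\}\Big\}. \tag{3.4}$$

It follows from (3.2), (3.3) and (3.4) that

$$\hat\beta_a(\lambda_1^*, \lambda_2) - \hat\beta_b(\lambda_1^*, \lambda_2) = \frac{1}{2\lambda_2}\Big\{\frac{1}{n}\sum_{i=1}^{n}\Big[x_{ia} - x_{ib} + \frac{\sum_{j\in R_i}x_{jb}\exp\big(\sum_{k=1}^{p}\hat\beta_k x_{jk}\big)}{\sum_{j\in R_i}\exp\big(\sum_{k=1}^{p}\hat\beta_k x_{jk}\big)} - \frac{\sum_{j\in R_i}x_{ja}\exp\big(\sum_{k=1}^{p}\hat\beta_k x_{jk}\big)}{\sum_{j\in R_i}\exp\big(\sum_{k=1}^{p}\hat\beta_k x_{jk}\big)}\Big] + \lambda_1^*\operatorname{sgn}\{\hat\beta_a(\lambda_1^*, \lambda_2)\}(\hat\omega_b - \hat\omega_a)\Big\}. \tag{3.5}$$

From Schoenfeld [9], we have

$$\hat r_{ir} = x_{ir} - E(x_{ir}|R_i), \qquad E(x_{ir}|R_i) = \frac{\sum_{j\in R_i}x_{jr}\exp(x_j^T\hat\beta)}{\sum_{j\in R_i}\exp(x_j^T\hat\beta)},$$

where $r = 1, 2, \cdots, p$. Therefore, (3.5) is equivalent to

$$\hat\beta_a(\lambda_1^*, \lambda_2) - \hat\beta_b(\lambda_1^*, \lambda_2) = \frac{1}{2n\lambda_2}\sum_{i=1}^{n}(\hat r_{ia} - \hat r_{ib}) + \frac{\lambda_1^*}{2\lambda_2}\operatorname{sgn}\{\hat\beta_a(\lambda_1^*, \lambda_2)\}(\hat\omega_b - \hat\omega_a). \tag{3.6}$$

Hence,

$$|\hat\beta_a(\lambda_1^*, \lambda_2) - \hat\beta_b(\lambda_1^*, \lambda_2)| \le \frac{1}{2n\lambda_2}\sum_{i=1}^{n}|\hat r_{ia} - \hat r_{ib}| + \frac{\lambda_1^*}{2\lambda_2}|\hat\omega_b - \hat\omega_a|. \tag{3.7}$$

Since $x_a$ and $x_b$ are highly correlated, i.e.,

$$E[x_a x_b^T] \to 1,$$

we have for an individual $i$

$$|x_{ia} - x_{ib}| \to 0, \quad\text{and}\quad |E(x_{ia}) - E(x_{ib})| \to 0.$$

Hence,

$$[x_{ia} - E(x_{ia}|R_i)] - [x_{ib} - E(x_{ib}|R_i)] \to 0. \tag{3.8}$$

By (3.6) and (3.8), we have

$$|\hat r_{ia} - \hat r_{ib}| \to 0 \quad\text{and}\quad |E(\hat r_{ia}) - E(\hat r_{ib})| \to 0. \tag{3.9}$$

Since

$$\hat\omega_a = (|\hat\beta(EN)_a|)^{-\gamma} \quad\text{and}\quad \hat\omega_b = (|\hat\beta(EN)_b|)^{-\gamma},$$

we have

$$|\hat\omega_a - \hat\omega_b| \to 0. \tag{3.10}$$

It follows from (3.7), (3.9) and (3.10) that

$$D_{\lambda_1^*,\lambda_2}(a,b) \to 0.$$

The theorem holds. □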
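
The quantities driving the bound (3.7) are easy to compute numerically. The following sketch is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, ties are assumed absent, and the residuals are evaluated only at observed failure times, while `n` in the bound is passed explicitly as the sample size.

```python
import numpy as np

def schoenfeld_residuals(beta, X, time, event):
    """Rows r_i = x_i - E(x | R_i), one per observed failure (no ties assumed)."""
    eta = X @ beta
    rows = []
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]                       # risk set R_i
        w = np.exp(eta[at_risk])                        # relative risks in R_i
        expected = (w[:, None] * X[at_risk]).sum(axis=0) / w.sum()
        rows.append(X[i] - expected)
    return np.array(rows)

def grouping_bound(resid, a, b, lam1_star, lam2, w_a, w_b, n):
    """Right-hand side of (3.7) for covariates a and b."""
    residual_term = np.abs(resid[:, a] - resid[:, b]).sum() / (2.0 * n * lam2)
    weight_term = lam1_star * abs(w_b - w_a) / (2.0 * lam2)
    return residual_term + weight_term

# Two identical covariates with equal adaptive weights: the bound is exactly 0.
X_demo = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
t_demo = np.array([1.0, 2.0, 3.0])
d_demo = np.array([True, True, True])
res = schoenfeld_residuals(np.zeros(2), X_demo, t_demo, d_demo)
bound = grouping_bound(res, 0, 1, lam1_star=0.1, lam2=1.0, w_a=2.0, w_b=2.0, n=3)
```

When the two columns coincide and the adaptive weights agree, both terms vanish, which is the limiting case described by (3.9) and (3.10).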
4. Oracle property

In this section, we study the oracle property of the Adaptive Elastic Net method for
the time-dependent Cox model. To emphasize the time dependence, we rewrite the model
(2.1) as follows:

$$h(t|X) = h_0(t)\exp\{\beta^T X(t)\}, \tag{4.1}$$

where the covariate $X(t)$ is time-dependent. Hence, the Adaptive Elastic Net estimator
for this Cox model is

$$\hat\beta(AEN) = \arg\min_{\beta}\Big\{-\frac{1}{n}l_n(\beta) + \lambda_1^*\sum_{k=1}^{p}\hat\omega_k|\beta_k| + \lambda_2\sum_{k=1}^{p}\beta_k^2\Big\}, \tag{4.2}$$

where $l_n(\beta)$ is the partial log-likelihood function.
Next we consider

$$Q_n(\beta) = l_n(\beta) - n\lambda_1^*\sum_{k=1}^{p}\hat\omega_k|\beta_k| - n\lambda_2\sum_{k=1}^{p}\beta_k^2.$$

We suppose that the true parameter $\beta_0 = (\beta_{01}, \cdots, \beta_{0p})^T$ is sparse, and
$A = \{k : \beta_{0k} \ne 0\} = \{1, \cdots, p_0\}$ with $p_0 < p$. Hence $\beta_0 = (\beta_A, \beta_{A^c})^T$ with $\beta_{A^c} = 0$,
where $0$ is the zero vector. The corresponding estimator $\hat\beta$ takes the form

$$\hat\beta = (\hat\beta_A, \hat\beta_{A^c})^T.$$

Let $I(\beta_0)$ be the Fisher information matrix. We also introduce the following notation. For
any matrix $B$,

$$\|B\| = \sup_{i,j}|b_{ij}|,$$

and for any vector $a$,

$$a^{\otimes 0} = 1, \quad a^{\otimes 1} = a, \quad a^{\otimes 2} = a \otimes a, \quad \|a\| = \sup_i|a_i|, \quad\text{and}\quad |a| = (a^T a)^{1/2},$$

where $a \otimes b$ is the $p \times p$ matrix $ab^T$ for any vectors $a, b \in R^p$. Before we state our main
result in this section, similar to Fan and Li [6], we need the following conditions:

(a) $\cdots < \infty$, $n\lambda_1^* \to \infty$, and $\lambda_2/n \to 0$, as $n \to \infty$; (4.3)

(b) There exists a neighborhood $\Omega$ of $\beta_0$ such that

$$E\Big[\sup_{t\in[0,1],\,\beta\in\Omega} Y(t)X^T(t)X(t)\exp\{\beta^T X(t)\}\Big] < \infty; \tag{4.4}$$

(c) Moreover, for any $\beta \in \Omega$, we have

$$s_0(\beta,t) = E[Y(t)\exp\{\beta^T X(t)\}],$$
$$s_1(\beta,t) = E[Y(t)X(t)\exp\{\beta^T X(t)\}], \quad\text{and}$$
$$s_2(\beta,t) = E[Y(t)X(t)X^T(t)\exp\{\beta^T X(t)\}],$$

where $s_i(\beta,t)$, $i = 0, 1, 2$, satisfy the following:

(I) $s_0(\beta,t)$, $s_1(\beta,t)$ and $s_2(\beta,t)$ are uniformly continuous in $t \in [0,1]$ for $\beta \in \Omega$;

(II) $s_2(\beta,t)$ and $s_1(\beta,t)$ are bounded on $\Omega \times [0,1]$. Moreover, $s_0(\beta,t)$ is assumed to be
positive and bounded away from zero on $\Omega \times [0,1]$.

(d) Define

$$v(\beta,t) = \frac{s_2(\beta,t)}{s_0(\beta,t)} - e(\beta,t)^{\otimes 2} \quad\text{with}\quad e(\beta,t) = \frac{s_1(\beta,t)}{s_0(\beta,t)}.$$

Then, the Fisher information matrix

$$I(\beta_0) = \int_0^1 v(\beta_0,t)\,s_0(\beta_0,t)\,h_0(t)\,dt$$

is finite and positive definite.

In addition, the regularity conditions in Andersen and Gill [1] are assumed throughout this
section. Then, we have the following theorem.

Theorem 4.1 If (a)-(d) hold, then the Adaptive Elastic Net estimator $\hat\beta$ for the
time-dependent Cox model has the sparsity property, i.e.,

$$P(\hat\beta_{A^c} = 0) \to 1.$$


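Proof: We note that the partial log-likelihood function of the time-dependent Cox model
is

$$l_n(\beta) = \sum_{i=1}^{n}\int_0^1 \beta^T X_i(t)\,dN_i(t) - \int_0^1 \log\Big[\sum_{i=1}^{n}Y_i(t)\exp\{\beta^T X_i(t)\}\Big]\,dN(t), \tag{4.5}$$

where $N(t) = \sum_{i=1}^{n}N_i(t)$.

Then, for each $\beta$ in the neighborhood $\Omega$ of $\beta_0$, we have

$$\frac{1}{n}\{l_n(\beta) - l_n(\beta_0)\} = f(\beta) + O_p\Big(\frac{\|\beta - \beta_0\|}{\sqrt{n}}\Big), \tag{4.6}$$

where $O_p(\cdot)$ denotes the convergence rate, and

$$f(\beta) = \int_0^1 \Big[(\beta - \beta_0)^T s_1(\beta_0,t) - \log\Big\{\frac{s_0(\beta,t)}{s_0(\beta_0,t)}\Big\}s_0(\beta_0,t)\Big]h_0(t)\,dt. \tag{4.7}$$

Let $\beta = \beta_0 + \frac{u}{\sqrt{n}}$, where $\|u\| \le C$ for some large enough constant $C$. According to
Theorem 3.1 in Fan and Li [5], we know that for any $\varepsilon > 0$,

$$\cdots > 1 - \varepsilon.$$

Then

$$\|\hat\beta - \beta_0\| = O_p(n^{-1/2}). \tag{4.8}$$

By the Taylor expansion of $l_n(\beta)$, we have

$$l_n(\beta) = l_n(\beta_0) + l_n'(\beta_0)(\beta - \beta_0) + O_p(\sqrt{n}\|\beta - \beta_0\|). \tag{4.9}$$

By (4.6) and (4.9), we obtain

$$l_n(\beta) = l_n(\beta_0) + nf(\beta) + O_p(\sqrt{n}\|\beta - \beta_0\|), \tag{4.10}$$

where $f(\beta)$ is given by (4.7).

On the other hand, by condition (c), we have

$$\frac{\partial f(\beta)}{\partial \beta} = \int_0^1 \Big[\frac{s_1(\beta_0,t)}{s_0(\beta_0,t)} - \frac{s_1(\beta,t)}{s_0(\beta,t)}\Big]s_0(\beta_0,t)h_0(t)\,dt,$$

and

$$\frac{\partial^2 f(\beta)}{\partial \beta\,\partial \beta^T} = -\int_0^1 \frac{s_2(\beta,t)s_0(\beta,t) - s_1(\beta,t)s_1^T(\beta,t)}{[s_0(\beta,t)]^2}\,s_0(\beta_0,t)h_0(t)\,dt.$$

Hence

$$f(\beta_0) = 0, \quad \frac{\partial f(\beta)}{\partial \beta}\Big|_{\beta=\beta_0} = 0, \quad\text{and}\quad -\frac{\partial^2 f(\beta)}{\partial \beta\,\partial \beta^T}\Big|_{\beta=\beta_0} = I(\beta_0). \tag{4.11}$$

From (4.11) and the Taylor expansion, we have

$$f(\beta) = -\frac{1}{2}(\beta - \beta_0)^T\{I(\beta_0) + o(1)\}(\beta - \beta_0). \tag{4.12}$$

Next we study (4.10). Since

$$\frac{\partial l_n(\beta)}{\partial \beta_k} = O_p(\sqrt{n}), \quad k = p_0 + 1, \cdots, p,$$

we have

$$\frac{\partial Q_n(\beta)}{\partial \beta_k} = O_p(\sqrt{n}) - n\lambda_1^*\hat\omega_k\operatorname{sgn}\{\beta_k\} - 2n\lambda_2\beta_k. \tag{4.13}$$

Noting

$$n^{1/2}(\hat\beta_{A^c} - 0) = O_p(1),$$

we obtain

$$\frac{\partial Q_n(\beta)}{\partial \beta_k} = \cdots\, O_p(1). \tag{4.14}$$

Combining (4.3) and (4.14), we get that the sign of $\partial Q_n(\beta)/\partial \beta_k$ is determined only by
that of $\beta_k$, where

$$\beta_k \in (-Cn^{-1/2}, Cn^{-1/2}), \quad k = p_0 + 1, \cdots, p.$$

Therefore,

$$\|\hat\beta_A - \beta_A\| = O_p(n^{-1/2}). \tag{4.15}$$

By (4.8) and (4.15),

$$Q_n(\hat\beta_A, 0) = \max_{\|\beta_{A^c}\| \le Cn^{-1/2}} Q_n(\hat\beta_A, \beta_{A^c}).$$

Therefore

$$P(\hat\beta_{A^c} = 0) \to 1.$$

The proof is completed. □

Next we study its asymptotic normality.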


Theorem 4.2 Suppose that the conditions (a)-(d) hold. Then the Adaptive Elastic Net
estimator $\hat\beta$ for the time-dependent Cox model has the following asymptotic normality:

$$\sqrt{n}(\hat\beta_A - \beta_A) \xrightarrow{D} N\big(0, I_1^{-1}(\beta_A)\big),$$

where $\xrightarrow{D}$ denotes convergence in distribution.

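Proof: According to the proof of Theorem 4.1, we know that there exists $\hat\beta_A$ such that
for $k = 1, \cdots, p_0$,

$$\frac{\partial Q_n(\beta)}{\partial \beta_k}\Big|_{\beta=(\hat\beta_A, 0)^T} = 0. \tag{4.16}$$

Let $U_n(\beta)$ be the score function of $l_n(\beta)$, that is,

$$U_n(\beta) = \sum_{i=1}^{n}\int_0^1 X_i(t)\,dN_i(t) - \int_0^1 \frac{\sum_{i=1}^{n}Y_i(t)X_i(t)\exp\{\beta^T X_i(t)\}}{\sum_{i=1}^{n}Y_i(t)\exp\{\beta^T X_i(t)\}}\,dN(t). \tag{4.17}$$

Moreover, define

$$I(\beta) = \int_0^1 \Big\{\frac{\sum_{i=1}^{n}Y_i(t)X_i(t)X_i^T(t)\exp\{\beta^T X_i(t)\}}{\sum_{i=1}^{n}Y_i(t)\exp\{\beta^T X_i(t)\}} - \frac{\big[\sum_{i=1}^{n}Y_i(t)X_i(t)\exp\{\beta^T X_i(t)\}\big]\big[\sum_{i=1}^{n}Y_i(t)X_i(t)\exp\{\beta^T X_i(t)\}\big]^T}{\big[\sum_{i=1}^{n}Y_i(t)\exp\{\beta^T X_i(t)\}\big]^2}\Big\}\,dN(t). \tag{4.18}$$

On the other hand, we have

$$\frac{\partial Q_n(\beta)}{\partial \beta_k}\Big|_{\beta=(\hat\beta_A, 0)^T} = \frac{\partial l_n(\beta)}{\partial \beta_k}\Big|_{\beta=(\hat\beta_A, 0)^T} - n\lambda_1^*\sum_k\hat\omega_k\operatorname{sgn}\{\hat\beta_k\} - 2n\lambda_2\sum_k\hat\beta_k = 0. \tag{4.19}$$

According to the Taylor expansion, (4.19) is equivalent to

$$U_A(\beta_0) - I_1(\beta^*)(\hat\beta_A - \beta_A) - n\lambda_1^*\sum_k\hat\omega_k\operatorname{sgn}\{\hat\beta_k\} - 2n\lambda_2\sum_k\hat\beta_k = 0, \tag{4.20}$$

where $\beta^*$ lies between $\hat\beta$ and $\beta_0$.

According to (4.3) and Andersen and Gill [1], we have

$$\frac{1}{\sqrt{n}}U_A(\beta_0) \xrightarrow{D} N\big(0, I_1(\beta_A)\big), \quad\text{as } n \to \infty, \tag{4.21}$$

and

$$\frac{1}{n}I_1(\beta^*) \to I_1(\beta_A), \tag{4.22}$$

where $U_A(\beta_0)$ is composed of the first $p_0$ elements of $U(\beta_0)$, and $I_1(\beta_A)$ is the $p_0 \times p_0$
submatrix of $I(\beta_0)$.

On the other hand, if we assume

$$\sqrt{n}\lambda_1^* \to \lambda_0, \tag{4.23}$$

then

$$\sqrt{n}(\hat\beta_A - \beta_A) = nI_1^{-1}(\beta^*)\Big\{\frac{1}{\sqrt{n}}U_A(\beta_0) - \sqrt{n}\lambda_1^*\sum_k\hat\omega_k\operatorname{sgn}\{\hat\beta_k\}\Big\} + o_p(1). \tag{4.24}$$

Note that from (4.22),

$$nI_1^{-1}(\beta^*) \to I_1^{-1}(\beta_A). \tag{4.25}$$

Then (4.24) can be rewritten as

$$\sqrt{n}(\hat\beta_A - \beta_A) = I_1^{-1}(\beta_A)\Big\{\frac{1}{\sqrt{n}}U_A(\beta_0) - \lambda_0 b_1\Big\} + o_p(1), \tag{4.26}$$

where

$$b_1 = \sum_k\hat\omega_k\operatorname{sgn}\{\beta_k\}, \quad k = 1, \cdots, p_0.$$

According to Slutsky's theorem, we have, as $n \to \infty$,

$$\sqrt{n}(\hat\beta_A - \beta_A) \xrightarrow{D} N\big(-\lambda_0 I_1^{-1}(\beta_A)b_1,\ I_1^{-1}(\beta_A)\big). \tag{4.27}$$

Hence, if $\lambda_0 = 0$, then (4.26) and (4.27) can be simplified to

$$\sqrt{n}(\hat\beta_A - \beta_A) = I_1^{-1}(\beta_A)\Big\{\frac{1}{\sqrt{n}}U_A(\beta_0)\Big\} + o_p(1) \tag{4.28}$$

and

$$\sqrt{n}(\hat\beta_A - \beta_A) \xrightarrow{D} N\big(0, I_1^{-1}(\beta_A)\big), \quad\text{as } n \to \infty, \tag{4.29}$$

respectively. The proof of the theorem is finished. □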

5. Empirical analysis

Theorem 3.1 reveals that the Adaptive Elastic Net estimator for the Cox model enjoys
the grouping effect. Next, we carry out an empirical analysis.

The data come from a questionnaire on college students' usage of mobile phone cards.
Here x1, ..., x9 and x10 denote Sex, Grade, Position, Nation, Registered residence,
School address, Operators, The average monthly telephone charges, The quality of
service and The average monthly living expenses, respectively. The study was
recorded by year, from 2007 to 2014. We finally obtained 380 valid questionnaires.
The training set contains 300 observations and the test set contains 80. Part of the data
is listed in Table 1.


Table 1: Data sample

No.   Study times   Status     x1   x2   x3   x4   x5   x6   x7   x8   x9   x10
1     6             censored   2    6    1    1    2    1    1    3    2    2
2     1             loss       2    3    1    1    1    1    1    2    2    1
3     1             loss       1    3    2    2    1    1    2    2    2    1
4     1             loss       1    6    1    1    1    1    2    2    1    1
5     4             censored   2    4    2    2    1    1    1    5    2    4
380   2             loss       2    2    2    2    2    1    1    2    3    1


Since x8 and x10 are moderately correlated, we did the variable selection by the Lasso
method, the Adaptive Lasso method (ALasso), the Elastic Net method (EN) and the Adaptive
Elastic Net method (AEN), respectively. The selected variables are in Table 2 and
the coefficient estimators are in Table 3.


Table 2: Statistics results for the real data

Method   Variables selected in the model
Lasso    x2   x4   x5   x6        x9   x10
ALasso   x2        x5             x9   x10
EN       x2   x4   x5   x6   x8   x9   x10
AEN      x2        x5   x6   x8   x9   x10


Table 3: Coefficient estimators obtained by the real data

         x1   x2       x3      x4       x5      x6       x7   x8      x9      x10
Lasso    0    -0.092   0.005   -0.011   0.125   -0.246   0    0       0.277   0.012
ALasso   0    -0.042   0       0        0.096   0        0    0       0.018   0.047
EN       0    -0.112   0.004   -0.023   0.124   -0.284   0    0.019   0.313   0.026
AEN      0    -0.120   0.002   0        0.125   -0.301   0    0.030   0.329   0.036


From Table 3 we obtain the following:

(1) None of the four methods selects x1 or x7 into the model, which suggests that the
respondents' gender and operator have no effect on the usage of mobile phone cards.




(2) The coefficient of x2 is negative, which shows that senior students have a low probability
of loss. The coefficient of x9 is positive, which shows that the higher the students' average
monthly living expenses are, the greater the loss probability is. Similarly, the coefficient of
x10 is positive, which means that the worse the operator's customer service quality is, the
greater the loss probability is. This is consistent with the actual situation.

(3) The coefficient estimators obtained by the AEN are the closest to the true model.

(4) For x8 and x10, both the Lasso and ALasso select only x10, while the EN and AEN
select both into the model. This suggests that these two methods can select all the
strongly correlated variables into the model, and their coefficient estimators are
almost the same, which reflects the grouping effect.

Theorems 3.1, 4.1 and 4.2 reveal that the Adaptive Elastic Net estimator for the Cox
model enjoys the grouping effect and the oracle property. Next, we illustrate these properties
through a numerical simulation.

Let $x_i \sim N(0, 1)$, $i = 1, 2, 5, 6, 8, 9, 10$. Moreover, let $x_2 = x_3$, $x_6 = x_7$ and
$x_4 = 2x_1 + \cdots\, x_2 + \cdots\, x_3$. Then $x_2$ and $x_3$ are strongly correlated, as are $x_6$ and $x_7$.
Moreover, there exists a linear relationship between $x_1$, $x_2$, $x_3$ and $x_4$. Consider the
following Cox model,

$$h(t) = h_0(t)\exp\Big(\sum_{i=1}^{10}\beta_i x_i\Big),$$

where $t \sim U[0, 1]$. The true parameter $\beta$ is $(-1, 2, 2, 0, 1, 1, 0, 0, 0)^T$. Then we did the
simulation with $n = 1000$ and $p = 10$.
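
A design of this kind can be generated in a few lines. The sketch below is only an illustration under loudly stated assumptions: the coefficients of $x_2$ and $x_3$ in $x_4$ (`c2`, `c3`) are placeholders, the ten-entry coefficient vector `beta_assumed` (with 0.5 for $x_5$) is a guess prompted by Table 4 because the printed vector has only nine entries, and survival times are drawn from a unit baseline hazard without censoring, which the text does not specify.

```python
import numpy as np

def simulate_design(n=1000, seed=0, c2=0.5, c3=0.5):
    """Covariates with two duplicated pairs and one exact linear combination.

    c2 and c3 are ASSUMED placeholder coefficients for x2 and x3 in x4.
    """
    rng = np.random.default_rng(seed)
    X = np.empty((n, 10))
    for j in (0, 1, 4, 5, 7, 8, 9):            # x1, x2, x5, x6, x8, x9, x10 ~ N(0, 1)
        X[:, j] = rng.standard_normal(n)
    X[:, 2] = X[:, 1]                          # x3 duplicates x2
    X[:, 6] = X[:, 5]                          # x7 duplicates x6
    X[:, 3] = 2.0 * X[:, 0] + c2 * X[:, 1] + c3 * X[:, 2]   # x4 is collinear
    return X

def simulate_times(X, beta, seed=1):
    """Failure times under a Cox model with ASSUMED unit baseline hazard."""
    rng = np.random.default_rng(seed)
    # T = E / exp(beta^T x) with E ~ Exp(1) has hazard exp(beta^T x).
    return rng.exponential(size=X.shape[0]) / np.exp(X @ beta)

# ASSUMED ten-entry coefficient vector (hypothetical, see lead-in).
beta_assumed = np.array([-1.0, 2.0, 2.0, 0.0, 0.5, 1.0, 1.0, 0.0, 0.0, 0.0])
X_sim = simulate_design(n=1000)
T_sim = simulate_times(X_sim, beta_assumed)
```

Because $x_3$, $x_7$ and $x_4$ are exact functions of other columns, the design matrix has rank 7, which is what makes this setting a test of both the grouping effect and the handling of collinearity.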

We used the Lasso, ALasso, EN and AEN to do variable selection, respectively. Let
$\lambda_2 = \cdots$, $\gamma = 3$, and let the other parameters be selected by the cross validation method.
By using the Lars algorithm [2], we obtain the coefficient estimators; see Table 4.


Table 4: Coefficient estimators obtained by numerical simulation

         x1         x2        x3        x4   x5
Lasso    -0.99553   3.89255   0.09627   0    0.49304
ALasso   -0.99796   3.89967   0.09901   0    0.49936
EN       -0.99509   1.99726   1.99726   0    0.50021
AEN      -0.99947   1.99951   1.99951   0    0.49992

         x6        x7        x8        x9        x10
Lasso    1.88214   0.05014   0.00794   0.00261   0.00385
ALasso   1.90981   0.03938   0         0         0.00013
EN       0.99854   0.99854   0.00261   0.00016   0.00248
AEN      0.99976   0.99976   0         0.00010   0


From Table 4 we get:

(1) The coefficient estimators obtained by the AEN are the closest to the true model.

(2) None of the four methods selects x4 into the model, which implies that these four
methods are all able to deal with collinearity problems.

(3) We look at the grouped variables (x2, x3) and (x6, x7). The AEN and EN select
all the strongly correlated variables into the model, and the coefficient estimators
within each of these two groups are the same, whereas the Lasso and ALasso mainly
select x2 and x6 from the respective groups. This indicates that the AEN and EN enjoy
the grouping effect.

(4) We focus on the variables x8, x9 and x10. The ALasso and AEN give more
accurate estimates of the zero coefficients than the other two methods
do. This indicates that the AEN has the oracle property.
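
The grouping statement in item (3) can be checked directly from the numbers in Table 4. The short sketch below simply recomputes the within-group gaps between the reported estimates; the variable names are illustrative, and no new data are introduced.

```python
import numpy as np

methods = ["Lasso", "ALasso", "EN", "AEN"]

# Coefficient estimates for x1..x10, copied from Table 4.
coef = np.array([
    [-0.99553, 3.89255, 0.09627, 0, 0.49304, 1.88214, 0.05014, 0.00794, 0.00261, 0.00385],
    [-0.99796, 3.89967, 0.09901, 0, 0.49936, 1.90981, 0.03938, 0.0,     0.0,     0.00013],
    [-0.99509, 1.99726, 1.99726, 0, 0.50021, 0.99854, 0.99854, 0.00261, 0.00016, 0.00248],
    [-0.99947, 1.99951, 1.99951, 0, 0.49992, 0.99976, 0.99976, 0.0,     0.00010, 0.0],
])

# Zero-based column indices of the two strongly correlated pairs.
groups = {"(x2, x3)": (1, 2), "(x6, x7)": (5, 6)}

# Absolute gap between the two estimates inside each group, per method.
gaps = {
    m: {g: abs(row[a] - row[b]) for g, (a, b) in groups.items()}
    for m, row in zip(methods, coef)
}

for m in methods:
    print(m, {g: round(v, 5) for g, v in gaps[m].items()})
```

For the EN and AEN rows both gaps are zero, while for the Lasso and ALasso rows the gaps are close to the full size of the larger coefficient, which is the pattern described in item (3).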

6. Conclusion 

In this paper, we study the Adaptive Elastic Net method for the Cox model. We show
that it has the grouping effect and the oracle property. These two properties are illustrated by
an empirical analysis and a numerical simulation. In these examples, the Adaptive Elastic
Net and Elastic Net make up for the shortcomings of the Lasso and Adaptive Lasso and
select all the strongly correlated variables into the model, i.e., the Adaptive Elastic Net
method for the Cox model enjoys the grouping effect. In addition, the Adaptive Elastic
Net method for the Cox model has the oracle property.